Kim Johnson
3/25/2020
Let’s see how partial likelihood (PL) estimation works. The goals of this exercise are: 1) to understand Cox regression and PL; and 2) to get a feel for the general procedure of ML estimation
-Step 1: Develop PL function - Sort data in ascending order of T (survival times) - The hazard for individual 1 is:
The likelihood for individual 1 to have the event at time t is the “hazard for individual 1 at time t/(hazard for the risk set at time t)” or
The likelihood is equal to the hazard for case 1 at time t divided by the sum of the hazards for all cases who were at risk of having the event at time t.
and defines the likelihood for a censoring case (e.g. case 21, dead=0) as
-Step 2: Plug in starting value of \(\beta\)
-Step 3: Search the best \(\beta\) until you maximize the log PL
The Wald test (recommended)
The partial likelihood ratio test: G=2[L\(_{p}\)(M1)-L\(_{p}\)(M0)] where L\(_{p}\)(M1) is the log likelihood of the model containing the covariates being tested and L\(_{p}\)(M0) is the log likelihood of the null model. We can also use this test to compare models as we have done previously. The degrees of freedom to perform a chi-square test is the number of test covariates
HR=exp(\(\widehat{\beta}\))
95% confidence interval
#install.packages("survival") #for survival analysis by group
#install.packages('ggfortify') #for survival analysis by group
#install.packages("survminer") #for pairwise diffs
#install.packages("readxl") #for importing excel datasets
#install.packages("tidyverse")
#install.packages("lmtest")
#install.packages("MASS")
library(survminer)#for pairwise diffs## Loading required package: ggplot2
## Loading required package: ggpubr
## Loading required package: magrittr
library(survival) #for calculating KM values
library(MASS)#for log log survival curves
library(ggfortify) #for KM curves
library(readxl) # for reading in excel file
library(ggplot2) # for plotting KM curve
library(tidyverse) # for various packages## ── Attaching packages ──────────────────────────────────── tidyverse 1.3.0 ──
## ✔ tibble 2.1.3 ✔ dplyr 0.8.4
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ✔ purrr 0.3.3
## ── Conflicts ─────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::extract() masks magrittr::extract()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::select() masks MASS::select()
## ✖ purrr::set_names() masks magrittr::set_names()
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
leuk <- read.dta("http://web1.sph.emory.edu/dkleinb/allDatasets/surv2datasets/anderson.dta")# leukemia remission dataset with sex variable found on the internet. Please note in this version of the dataset some of the variable names have changed
#label the group variable
leuk$rx<-factor(leuk$rx,
levels = c(0,1),
labels = c("Treated", "Untreated"))
leuk$id <- rownames(leuk)treat.mod<-coxph(Surv(survt, status)~rx, leuk, ties="efron") #using ties = Efron, default is Efron, which is fine but this is how it would be changed.
summary(treat.mod)## Call:
## coxph(formula = Surv(survt, status) ~ rx, data = leuk, ties = "efron")
##
## n= 42, number of events= 30
##
## coef exp(coef) se(coef) z Pr(>|z|)
## rxUntreated 1.5721 4.8169 0.4124 3.812 0.000138 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## rxUntreated 4.817 0.2076 2.147 10.81
##
## Concordance= 0.69 (se = 0.041 )
## Likelihood ratio test= 16.35 on 1 df, p=5e-05
## Wald test = 14.53 on 1 df, p=1e-04
## Score (logrank) test = 17.25 on 1 df, p=3e-05
#Interpretation: Those who were not treated had a 4.82 (95% CI 2.15-10.81) times higher hazard of relapse than those who were not treated.
treat.mod1<-coxph(Surv(survt, status)~rx, leuk, ties="breslow") #using ties = Breslow--the default for SAS
summary(treat.mod1)## Call:
## coxph(formula = Surv(survt, status) ~ rx, data = leuk, ties = "breslow")
##
## n= 42, number of events= 30
##
## coef exp(coef) se(coef) z Pr(>|z|)
## rxUntreated 1.5092 4.5231 0.4096 3.685 0.000229 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## rxUntreated 4.523 0.2211 2.027 10.09
##
## Concordance= 0.69 (se = 0.041 )
## Likelihood ratio test= 15.21 on 1 df, p=1e-04
## Wald test = 13.58 on 1 df, p=2e-04
## Score (logrank) test = 15.93 on 1 df, p=7e-05
## Call:
## coxph(formula = Surv(survt, status) ~ rx + logwbc, data = leuk)
##
## n= 42, number of events= 30
##
## coef exp(coef) se(coef) z Pr(>|z|)
## rxUntreated 1.3861 3.9991 0.4248 3.263 0.0011 **
## logwbc 1.6909 5.4243 0.3359 5.034 4.8e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## rxUntreated 3.999 0.2501 1.739 9.195
## logwbc 5.424 0.1844 2.808 10.478
##
## Concordance= 0.852 (se = 0.04 )
## Likelihood ratio test= 46.71 on 2 df, p=7e-11
## Wald test = 33.6 on 2 df, p=5e-08
## Score (logrank) test = 46.07 on 2 df, p=1e-10
## Call:
## coxph(formula = Surv(survt, status) ~ rx + logwbc + logwbc *
## rx, data = leuk)
##
## n= 42, number of events= 30
##
## coef exp(coef) se(coef) z Pr(>|z|)
## rxUntreated 2.3749 10.7500 1.7055 1.393 0.164
## logwbc 1.8724 6.5040 0.4514 4.148 3.35e-05 ***
## rxUntreated:logwbc -0.3175 0.7280 0.5258 -0.604 0.546
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## rxUntreated 10.750 0.09302 0.3799 304.16
## logwbc 6.504 0.15375 2.6850 15.75
## rxUntreated:logwbc 0.728 1.37372 0.2597 2.04
##
## Concordance= 0.851 (se = 0.041 )
## Likelihood ratio test= 47.07 on 3 df, p=3e-10
## Wald test = 32.39 on 3 df, p=4e-07
## Score (logrank) test = 49.86 on 3 df, p=9e-11
## Likelihood ratio test
##
## Model 1: Surv(survt, status) ~ rx
## Model 2: Surv(survt, status) ~ rx + logwbc
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 1 -85.008
## 2 2 -69.828 1 30.361 3.587e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# from http://www.sthda.com/english/wiki/cox-proportional-hazards-model
# Create the new data for plotting adjusted survival curves for each treatment group using log_WBC set at the mean
trt_df <- with(leuk, data.frame(rx = c("Untreated", "Treated"), logwbc=rep(mean(logwbc, na.rm = TRUE),2)))
trt_df## rx logwbc
## 1 Untreated 2.930238
## 2 Treated 2.930238
#problem with survminer ggsurvplot function that won't allow it to take model objects solved with code below. see: https://github.com/kassambara/survminer/issues/324
fit<-survfit(treat_adj.mod, newdata = trt_df)
fit$call$formula <- eval(fit$call$formula)
ggsurvplot(fit, data=leuk, conf.int = TRUE, legend.labs=c("Untreated", "Treated"), ggtheme = theme_minimal()) #change X-axis limits for fun
ggsurvplot(fit, data=leuk, conf.int = FALSE, legend.labs=c("Untreated", "Treated"), xlim = c(0, 35), ylim= c(0,1), ggtheme = theme_minimal()) #We can see from these curves after adjusting for log_WBC, that at almost all time points there is a higher survival probability in the treated group than in the untreated group.
#what about unadjusted KM curves--how do they compare?
leukemia.surv <- survfit(Surv(survt, status)~rx, leuk)
ggsurvplot(leukemia.surv, data = leuk, conf.int=FALSE, ggtheme = theme_minimal(), tables.theme = clean_theme())# Need to rerun the cox model with sex in it:
sex_treat_adj.mod<-coxph(Surv(survt, status)~rx + logwbc + sex, leuk)
summary(sex_treat_adj.mod)## Call:
## coxph(formula = Surv(survt, status) ~ rx + logwbc + sex, data = leuk)
##
## n= 42, number of events= 30
##
## coef exp(coef) se(coef) z Pr(>|z|)
## rxUntreated 1.5036 4.4978 0.4615 3.258 0.00112 **
## logwbc 1.6819 5.3760 0.3366 4.997 5.82e-07 ***
## sex 0.3147 1.3698 0.4545 0.692 0.48872
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## rxUntreated 4.498 0.2223 1.8204 11.113
## logwbc 5.376 0.1860 2.7794 10.398
## sex 1.370 0.7300 0.5621 3.338
##
## Concordance= 0.851 (se = 0.041 )
## Likelihood ratio test= 47.19 on 3 df, p=3e-10
## Wald test = 33.54 on 3 df, p=2e-07
## Score (logrank) test = 48.01 on 3 df, p=2e-10
# Create the new data for plotting adjusted survival curves for each treatment group using log_WBC set at the mean
sex_trt_df <- with(leuk, data.frame(sex=c(0,1), rx=1, logwbc=rep(mean(logwbc, na.rm = TRUE),2))) #"The curve(s) produced will be representative of a cohort whose covariates correspond to the values in newdata" (https://stat.ethz.ch/R-manual/R-patched/library/survival/html/survfit.coxph.html)
sex_trt_df## sex rx logwbc
## 1 0 1 2.930238
## 2 1 1 2.930238
#here is the code again to fix the error noted above with ggsurvplot
fit2<-survfit(sex_treat_adj.mod, newdata = sex_trt_df)## Warning in model.frame.default(data = structure(list(sex = c(0, 1), rx = c(1, :
## variable 'rx' is not a factor
fit2$call$formula <- eval(fit2$call$formula)
ggsurvplot(fit2, data=leuk, conf.int = FALSE, legend.labs=c("Sex=0", "Sex=1"), ggtheme = theme_minimal()) #compare to unadjusted KM curves for sex
leukemia.surv <- survfit(Surv(survt, status) ~ sex, leuk)
#Using survminer library to calculate KM plots with confidence intervals
ggsurvplot(leukemia.surv, data = leuk, ggtheme = theme_minimal(), tables.theme = clean_theme())#check for linearity of the log_WBC term
log_wbc.times.lwbc3<- leuk$lwbc3 * log(leuk$lwbc3)#create term to test linearity
boxTidwelllwbc3 <- coxph(Surv(survt, status)~rx + logwbc + sex + log_wbc.times.lwbc3, leuk)
summary(sex_treat_adj.mod) #Box Tidwell technique, test the assumption of linearity## Call:
## coxph(formula = Surv(survt, status) ~ rx + logwbc + sex, data = leuk)
##
## n= 42, number of events= 30
##
## coef exp(coef) se(coef) z Pr(>|z|)
## rxUntreated 1.5036 4.4978 0.4615 3.258 0.00112 **
## logwbc 1.6819 5.3760 0.3366 4.997 5.82e-07 ***
## sex 0.3147 1.3698 0.4545 0.692 0.48872
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## rxUntreated 4.498 0.2223 1.8204 11.113
## logwbc 5.376 0.1860 2.7794 10.398
## sex 1.370 0.7300 0.5621 3.338
##
## Concordance= 0.851 (se = 0.041 )
## Likelihood ratio test= 47.19 on 3 df, p=3e-10
## Wald test = 33.54 on 3 df, p=2e-07
## Score (logrank) test = 48.01 on 3 df, p=2e-10
## Call:
## coxph(formula = Surv(survt, status) ~ rx + logwbc + sex + log_wbc.times.lwbc3,
## data = leuk)
##
## n= 42, number of events= 30
##
## coef exp(coef) se(coef) z Pr(>|z|)
## rxUntreated 1.51094 4.53101 0.46119 3.276 0.00105 **
## logwbc 1.77287 5.88773 0.53783 3.296 0.00098 ***
## sex 0.31915 1.37595 0.45557 0.701 0.48359
## log_wbc.times.lwbc3 -0.06612 0.93602 0.30445 -0.217 0.82807
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## rxUntreated 4.531 0.2207 1.8350 11.19
## logwbc 5.888 0.1698 2.0518 16.89
## sex 1.376 0.7268 0.5634 3.36
## log_wbc.times.lwbc3 0.936 1.0684 0.5154 1.70
##
## Concordance= 0.85 (se = 0.04 )
## Likelihood ratio test= 47.23 on 4 df, p=1e-09
## Wald test = 33.73 on 4 df, p=8e-07
## Score (logrank) test = 48.68 on 4 df, p=7e-10
#check for influential observations
sex_treat_adj.mod<-coxph(Surv(survt, status)~rx + logwbc + sex, leuk)
summary(sex_treat_adj.mod)## Call:
## coxph(formula = Surv(survt, status) ~ rx + logwbc + sex, data = leuk)
##
## n= 42, number of events= 30
##
## coef exp(coef) se(coef) z Pr(>|z|)
## rxUntreated 1.5036 4.4978 0.4615 3.258 0.00112 **
## logwbc 1.6819 5.3760 0.3366 4.997 5.82e-07 ***
## sex 0.3147 1.3698 0.4545 0.692 0.48872
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## rxUntreated 4.498 0.2223 1.8204 11.113
## logwbc 5.376 0.1860 2.7794 10.398
## sex 1.370 0.7300 0.5621 3.338
##
## Concordance= 0.851 (se = 0.041 )
## Likelihood ratio test= 47.19 on 3 df, p=3e-10
## Wald test = 33.54 on 3 df, p=2e-07
## Score (logrank) test = 48.01 on 3 df, p=2e-10
#Compare to model without influential id, pick 22, the HR should go up for sex because the dfbeta residual is negative
sex_treat_adj.mod<-coxph(Surv(survt, status)~rx + logwbc + sex, leuk[which(leuk$id!=22),])
summary(sex_treat_adj.mod)## Call:
## coxph(formula = Surv(survt, status) ~ rx + logwbc + sex, data = leuk[which(leuk$id !=
## 22), ])
##
## n= 41, number of events= 29
##
## coef exp(coef) se(coef) z Pr(>|z|)
## rxUntreated 1.6068 4.9868 0.5136 3.129 0.00176 **
## logwbc 1.5646 4.7807 0.3489 4.484 7.32e-06 ***
## sex 0.4956 1.6415 0.4901 1.011 0.31191
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## rxUntreated 4.987 0.2005 1.8224 13.645
## logwbc 4.781 0.2092 2.4127 9.473
## sex 1.642 0.6092 0.6281 4.290
##
## Concordance= 0.849 (se = 0.041 )
## Likelihood ratio test= 45.46 on 3 df, p=7e-10
## Wald test = 32.55 on 3 df, p=4e-07
## Score (logrank) test = 47.25 on 3 df, p=3e-10
#Compare to model without influential id, pick 38, the HR should go down for sex because the dfbeta residual is positive
sex_treat_adj.mod<-coxph(Surv(survt, status)~rx + logwbc + sex, leuk[which(leuk$id!=37),])
summary(sex_treat_adj.mod)## Call:
## coxph(formula = Surv(survt, status) ~ rx + logwbc + sex, data = leuk[which(leuk$id !=
## 37), ])
##
## n= 41, number of events= 29
##
## coef exp(coef) se(coef) z Pr(>|z|)
## rxUntreated 1.4268 4.1652 0.4712 3.028 0.00246 **
## logwbc 1.8252 6.2038 0.3552 5.138 2.78e-07 ***
## sex 0.1747 1.1909 0.4744 0.368 0.71269
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## rxUntreated 4.165 0.2401 1.654 10.489
## logwbc 6.204 0.1612 3.092 12.446
## sex 1.191 0.8397 0.470 3.017
##
## Concordance= 0.865 (se = 0.04 )
## Likelihood ratio test= 48.45 on 3 df, p=2e-10
## Wald test = 33.22 on 3 df, p=3e-07
## Score (logrank) test = 48.54 on 3 df, p=2e-10